Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segment Replication - Allow shard idle when there are no replicas in index #7736

Closed
wants to merge 2 commits into from

Conversation

mch2
Copy link
Member

@mch2 mch2 commented May 24, 2023

Description

With Segment Replication enabled we generally disable shard idle. This change ensures there are replicas in the index before disabling shard idle with SR.

Related Issues

#7761

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@andrross
Copy link
Member

Nitpick on PR title/commit message. Should be "no replicas in index" not "...in cluster", right?

@mch2 mch2 changed the title Segment Replication - Allow shard idle when there are no replicas in cluster Segment Replication - Allow shard idle when there are no replicas in replication group. May 24, 2023
@mch2 mch2 changed the title Segment Replication - Allow shard idle when there are no replicas in replication group. Segment Replication - Allow shard idle when there are no replicas in replication group May 24, 2023
@mch2 mch2 changed the title Segment Replication - Allow shard idle when there are no replicas in replication group Segment Replication - Allow shard idle when there are no replicas in index May 24, 2023
@mch2
Copy link
Member Author

mch2 commented May 24, 2023

Nitpick on PR title/commit message. Should be "no replicas in index" not "...in cluster", right?

Yep you are right it should read index not cluster.

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.search.backpressure.SearchBackpressureIT.testSearchTaskCancellationWithHighCpu
      1 org.opensearch.remotestore.RemoteStoreRefreshListenerIT.testRemoteRefreshRetryOnFailure

@codecov
Copy link

codecov bot commented May 24, 2023

Codecov Report

Merging #7736 (fb3574a) into main (4228075) will decrease coverage by 0.28%.
The diff coverage is 100.00%.

❗ Current head fb3574a differs from pull request most recent head 50b03b3. Consider uploading reports for the commit 50b03b3 to get more accurate results

@@             Coverage Diff              @@
##               main    #7736      +/-   ##
============================================
- Coverage     70.98%   70.70%   -0.28%     
+ Complexity    56714    56136     -578     
============================================
  Files          4722     4680      -42     
  Lines        267608   266079    -1529     
  Branches      39216    39074     -142     
============================================
- Hits         189962   188137    -1825     
- Misses        61589    61975     +386     
+ Partials      16057    15967      -90     
Impacted Files Coverage Δ
...in/java/org/opensearch/index/shard/IndexShard.java 70.09% <100.00%> (-0.93%) ⬇️

... and 683 files with indirect coverage changes

Copy link
Collaborator

@Bukhtawar Bukhtawar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only concern I see with this is the bi-modal behaviour we are creating w/ and w/o replicas

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.search.backpressure.SearchBackpressureIT.testSearchTaskCancellationWithHighCpu
      1 org.opensearch.action.admin.cluster.node.tasks.ResourceAwareTasksTests.testTaskResourceTrackingDuringTaskCancellation

@andrross
Copy link
Member

The only concern I see with this is the bi-modal behaviour we are creating w/ and w/o replicas

Agreed. It's strange and unintuitive that a user could specify a value for index.search.idle.after and it will only take effect if there are no replicas. However, it is similarly strange that a seg rep index would behave differently from a doc rep index when there are no indexes (and therefore no replication is happening).

@mch2
Copy link
Member Author

mch2 commented May 25, 2023

The only concern I see with this is the bi-modal behaviour we are creating w/ and w/o replicas

Agreed. It's strange and unintuitive that a user could specify a value for index.search.idle.after and it will only take effect if there are no replicas. However, it is similarly strange that a seg rep index would behave differently from a doc rep index when there are no indexes (and therefore no replication is happening).

Agree on both points. Does documenting this clearly within Segment Replication docs alleviate some concern here? We could also add validation of replica count / replication strategy if index.search.idle.after is specified directly.

@andrross
Copy link
Member

I'm inclined to disallow the index.search.idle.after setting on a seg rep index in all cases. At a minimum I think we should disallow it when replicas are specified, as opposed to the current behavior where it is silently ignored.

Is there any precedent for index settings that are mutually incompatible?

@mch2
Copy link
Member Author

mch2 commented May 25, 2023

I've moved this to a new issue with benchmarks for the zero replica case. The original two linked issues are not specific to the zero replica case.

I'm inclined to disallow the index.search.idle.after setting on a seg rep index in all cases. At a minimum I think we should disallow it when replicas are specified, as opposed to the current behavior where it is silently ignored.

I am on board with this, but I think only for the case where replicas are specified. The benchmark i've included in the linked issue to me is significant enough of a hit to warrant this special case.

Is there any precedent for index settings that are mutually incompatible?

I think only from a validation perspective such as index.replication.type of document with index.remote_store.repository enabled.

mch2 added 2 commits June 16, 2023 14:10
… enabled.

This change updates search idle setting to not be configurable on segrep indices with replicas.

Signed-off-by: Marc Handalian <[email protected]>
@mch2
Copy link
Member Author

mch2 commented Jun 19, 2023

@andrross Made an update here to add validation to search.idle.after setting that prevents configuring this setting when SR is enabled with replicas. It will allow configuration of the setting with segrep only if no replicas are configured. I'm not in love with this special replica count behavior but I think it would be a poor experience to not be able to change from the default 30s value when there are no replicas.

@mch2
Copy link
Member Author

mch2 commented Jun 19, 2023

Closing this for now, I don't like the added dependency on IndicesService in IndexSettings.

@mch2 mch2 closed this Jun 19, 2023
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@mch2
Copy link
Member Author

mch2 commented Jun 19, 2023

Also the test to create an index with 0 replicas, and then add a replica results in flaky failures in validation. applyClusterState will run async and fail validation again and prevent the update.

This means user would have to unset the setting / set the value to 0 before adding a replica.

This was a bug in the validation when the value was left at default. Adding more context to issue...

@mch2
Copy link
Member Author

mch2 commented Jun 20, 2023

oops, I rewrote the history here and can't re-open - opened it again under #8173

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants